- Introduction
- Dataset
- 3D U-Net Model
- Metrics
- Soft Dice Loss
- Create and Train the Model
- Evaluation
- Conclusion
Introduction
In an earlier project I developed a deep learning model that could predict Alzheimer's disease from 3D MRI medical images. In this project we will build a deep learning model to automatically segment tumor regions in the brain using MRI (Magnetic Resonance Imaging) scans.
The MRI scan is one of the most common imaging modalities encountered in the field of radiology.
We will cover:
- What is in an MR image
- Standard data preparation techniques for MRI datasets
- Metrics and loss functions for segmentation
- Visualizing and evaluating segmentation models
Dataset
What is an MRI?
Magnetic resonance imaging (MRI) is an advanced imaging technique that is used to observe a variety of diseases and parts of the body.
As we will see later, neural networks can analyze these images individually (as a radiologist would) or combine them into a single 3D volume to make predictions.
At a high level, MRI works by measuring the radio waves emitted by atoms subjected to a magnetic field.
In this project, we'll build a multi-class segmentation model. We'll identify 3 different abnormalities in each image: edemas, non-enhancing tumors, and enhancing tumors.
MRI Data Processing
We often encounter MR images in the DICOM format.
- The DICOM format is the output format for most commercial MRI scanners. This type of data can be processed using the pydicom Python library.
We will be using data from the Medical Segmentation Decathlon challenge. This data has been mostly pre-processed for the competition participants; in real practice, however, MRI data needs significant pre-processing before we can use it to train our models.
Exploring the Dataset
Our dataset is stored in the NIfTI-1 format and we will be using the NiBabel library to interact with the files. Each training sample is composed of two separate files:
The first file is an image file containing a 4D array of MR image data with shape (240, 240, 155, 4).
- The first 3 dimensions are the X, Y, and Z values for each point in the 3D volume, which is commonly called a voxel.
- The 4th dimension contains the values for 4 different MR sequences:
- 0: FLAIR: "Fluid Attenuated Inversion Recovery"
- 1: T1w: "T1-weighted"
- 2: T1Gd: "T1-weighted with gadolinium contrast enhancement"
- 3: T2w: "T2-weighted"
The second file in each training example is a label file containing a 3D array with the shape of (240, 240, 155).
- The integer values in this array indicate the "label" for each voxel in the corresponding image files:
- 0: background
- 1: edema
- 2: non-enhancing tumor
- 3: enhancing tumor
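To get a feel for these shapes, the two files can be indexed as below; the zero-filled arrays are stand-ins with the documented shapes, not real scan data:

```python
import numpy as np

# Zero-filled stand-ins with the shapes described above (not real data)
image = np.zeros((240, 240, 155, 4))   # 4 MR sequences per voxel
label = np.zeros((240, 240, 155))      # one integer class label per voxel

flair = image[..., 0]                  # sequence 0: FLAIR
edema_mask = (label == 1)              # boolean mask of edema voxels

print(flair.shape, int(edema_mask.sum()))  # (240, 240, 155) 0
```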
We have access to a total of 484 training images which we will be splitting into a training (80%) and validation (20%) dataset.
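As a rough sketch (the actual filenames and split utility are handled elsewhere in the project), an 80/20 split of the 484 case indices could look like:

```python
import numpy as np

# Shuffle the 484 case indices and take the first 80% for training
rng = np.random.default_rng(seed=0)
indices = rng.permutation(484)

split = int(0.8 * len(indices))
train_idx, valid_idx = indices[:split], indices[split:]

print(len(train_idx), len(valid_idx))  # 387 97
```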
Let's begin by looking at one single case and visualizing the data.
We'll use the NiBabel library to load the image and label for a case. The function is shown below to give you a sense of how it works.
import numpy as np
import nibabel as nib

# set home directory and data directory
HOME_DIR = "data/BraTS-Data/"
DATA_DIR = HOME_DIR

def load_case(image_nifty_file, label_nifty_file):
    # load the image and label files and return a numpy array for each
    image = np.array(nib.load(image_nifty_file).get_fdata())
    label = np.array(nib.load(label_nifty_file).get_fdata())
    return image, label
We'll now visualize an example. For this, we use a pre-defined function we have written in the util.py file that uses matplotlib to generate a summary of the image.
The colors correspond to each class.
- Red is edema
- Green is a non-enhancing tumor
- Blue is an enhancing tumor.
image, label = load_case(DATA_DIR + "imagesTr/BRATS_003.nii.gz", DATA_DIR + "labelsTr/BRATS_003.nii.gz")
image = util.get_labeled_image(image, label)
util.plot_image_grid(image)
We have previously written a utility function that generates a GIF showing what it looks like to iterate over each axis.
image, label = load_case(DATA_DIR + "imagesTr/BRATS_003.nii.gz", DATA_DIR + "labelsTr/BRATS_003.nii.gz")
util.visualize_data_gif(util.get_labeled_image(image, label))
Data Preprocessing using Patches
While our dataset is provided to us post-registration and in the NIfTI format, we still have to do some minor pre-processing before feeding the data to our model.
Generate sub-volumes
We are going to first generate "patches" of our data which you can think of as sub-volumes of the whole MR images.
- The reason we are generating patches is that a network processing the entire volume at once simply will not fit in our current environment's GPU memory.
- Therefore we will be using this common technique to generate spatially consistent sub-volumes of our data, which can be fed into our network.
- Specifically, we will be generating randomly sampled sub-volumes of shape [160, 160, 16] from our images.
- Furthermore, given that a large portion of the MRI volumes are just brain tissue or black background without any tumors, we want to make sure that we pick patches that at least include some amount of tumor data.
- Therefore, we are only going to pick patches that have at most 95% non-tumor regions (so at least 5% tumor).
- We do this by filtering the volumes based on the values present in the background labels.
Standardization (mean 0, stdev 1)
Lastly, given that the values in MR images cover a very wide range, we will standardize the values to have a mean of zero and standard deviation of 1.
- This is a common technique in deep learning image pipelines, since standardization makes it much easier for the network to learn.
Let's define a function to get a sub volume/patch.
def get_sub_volume(image, label,
                   orig_x=240, orig_y=240, orig_z=155,
                   output_x=160, output_y=160, output_z=16,
                   num_classes=4, max_tries=1000,
                   background_threshold=0.95):
    """
    Extract random sub-volume from original images.

    Args:
        image (np.array): original image,
            of shape (orig_x, orig_y, orig_z, num_channels)
        label (np.array): original label.
            labels coded using discrete values rather than
            a separate dimension,
            so this is of shape (orig_x, orig_y, orig_z)
        orig_x (int): x_dim of input image
        orig_y (int): y_dim of input image
        orig_z (int): z_dim of input image
        output_x (int): desired x_dim of output
        output_y (int): desired y_dim of output
        output_z (int): desired z_dim of output
        num_classes (int): number of class labels
        max_tries (int): maximum trials to do when sampling
        background_threshold (float): limit on the fraction
            of the sample which can be the background

    Returns:
        X (np.array): sample of original image of dimension
            (num_channels, output_x, output_y, output_z)
        y (np.array): labels which correspond to X, of dimension
            (num_classes, output_x, output_y, output_z)
    """
    # initialize features and labels with `None`
    X = None
    y = None
    tries = 0

    while tries < max_tries:
        # randomly sample sub-volume by sampling the corner voxel
        start_x = np.random.randint(0, orig_x - output_x + 1)
        start_y = np.random.randint(0, orig_y - output_y + 1)
        start_z = np.random.randint(0, orig_z - output_z + 1)

        # extract relevant area of label
        y = label[start_x: start_x + output_x,
                  start_y: start_y + output_y,
                  start_z: start_z + output_z]

        # one-hot encode the categories.
        # This adds a 4th dimension, 'num_classes'
        # (output_x, output_y, output_z, num_classes)
        y = keras.utils.to_categorical(y, num_classes=num_classes)

        # compute the background ratio
        bgrd_ratio = np.sum(y[:, :, :, 0]) / (output_x * output_y * output_z)

        # increment tries counter
        tries += 1

        # if the background ratio is below the desired threshold,
        # use that sub-volume;
        # otherwise continue the loop and try another random sub-volume
        if bgrd_ratio < background_threshold:
            # make a copy of the sub-volume
            X = np.copy(image[start_x: start_x + output_x,
                              start_y: start_y + output_y,
                              start_z: start_z + output_z, :])

            # change dimension of X
            # from (x_dim, y_dim, z_dim, num_channels)
            # to (num_channels, x_dim, y_dim, z_dim)
            X = np.moveaxis(X, -1, 0)

            # change dimension of y
            # from (x_dim, y_dim, z_dim, num_classes)
            # to (num_classes, x_dim, y_dim, z_dim)
            y = np.moveaxis(y, -1, 0)

            # take a subset of y that excludes the background class
            # in the 'num_classes' dimension
            y = y[1:, :, :, :]

            return X, y

    # if we've tried max_tries number of samples,
    # give up in order to avoid looping forever
    print(f"Tried {tries} times to find a sub-volume. Giving up...")
### Test
get_sub_volume_test(get_sub_volume)
We'll now look at the enhancing tumor part of the label.
image, label = load_case(DATA_DIR + "imagesTr/BRATS_001.nii.gz", DATA_DIR + "labelsTr/BRATS_001.nii.gz")
X, y = get_sub_volume(image, label)
# enhancing tumor is channel 2 in the class label
# change indexer for y to look at different classes
util.visualize_patch(X[0, :, :, :], y[2])
Next, let's define a function that, given a patch (sub-volume), standardizes the values across each channel and each Z plane to have a mean of zero and a standard deviation of one.
def standardize(image):
    """
    Standardize mean and standard deviation
    of each channel and z_dimension.

    Args:
        image (np.array): input image,
            shape (num_channels, dim_x, dim_y, dim_z)

    Returns:
        standardized_image (np.array): standardized version of input image
    """
    # initialize to array of zeros, with same shape as the image
    standardized_image = np.zeros(image.shape)

    # iterate over channels
    for c in range(image.shape[0]):
        # iterate over the `z` dimension
        for z in range(image.shape[3]):
            # get a slice of the image
            # at channel c and z-th dimension `z`
            image_slice = image[c, :, :, z]

            # subtract the mean from image_slice
            centered = image_slice - np.mean(image_slice)

            # divide by the standard deviation (only if it is different from zero)
            if np.std(centered) != 0:
                centered_scaled = centered / np.std(centered)

                # update the slice of the standardized image
                # with the centered and scaled values
                standardized_image[c, :, :, z] = centered_scaled

    return standardized_image
### test
standardize_test(standardize, X)
X_norm = standardize(X)
util.visualize_patch(X_norm[0, :, :, :], y[2])
3D U-Net Model
Now let's build our model. In this project we will be building a 3D U-net.
This architecture will take advantage of the volumetric shape of MR images and is one of the best performing models for this task.
Dice Similarity Coefficient
Aside from the architecture, one of the most important elements of any deep learning method is the choice of our loss function.
A natural choice is the cross-entropy loss function.
- However, this loss function is not ideal for segmentation tasks due to heavy class imbalance (there are typically not many positive regions).
A much more common loss for segmentation tasks is the Dice similarity coefficient, which is a measure of how well two contours overlap.
- The Dice index ranges from 0 (complete mismatch) to 1 (perfect match).
In general, for two sets $A$ and $B$, the Dice similarity coefficient is defined as: $$\text{DSC}(A, B) = \frac{2 \times |A \cap B|}{|A| + |B|}.$$
Here we can interpret $A$ and $B$ as sets of voxels, $A$ being the predicted tumor region and $B$ being the ground truth.
Our model will map each voxel to 0 or 1
- 0 means it is a background voxel
- 1 means it is part of the segmented region.
In the dice coefficient, the variables in the formula are:
- $x$ : the input image
- $f(x)$ : the model output (prediction)
- $y$ : the label (actual ground truth)
The dice coefficient "DSC" is:
$$\text{DSC}(f, x, y) = \frac{2 \times \sum_{i, j} f(x)_{ij} \times y_{ij} + \epsilon}{\sum_{i,j} f(x)_{ij} + \sum_{i, j} y_{ij} + \epsilon}$$
- $\epsilon$ is a small number that is added to avoid division by zero
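To make the formula concrete, here is a quick NumPy check on toy 2x2 binary masks (illustrative values, not model output):

```python
import numpy as np

# Toy 2x2 binary masks: prediction f(x) and ground truth y
pred  = np.array([[1, 0],
                  [1, 1]])
label = np.array([[1, 0],
                  [0, 1]])

epsilon = 1e-5
intersection = np.sum(pred * label)  # 2 overlapping voxels
dsc = (2 * intersection + epsilon) / (np.sum(pred) + np.sum(label) + epsilon)

print(round(dsc, 3))  # 0.8, i.e. 2*2 / (3 + 2)
```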
Let's define a function for the dice coefficient.
def single_class_dice_coefficient(y_true, y_pred, axis=(0, 1, 2),
                                  epsilon=0.00001):
    """
    Compute dice coefficient for single class.

    Args:
        y_true (Tensorflow tensor): tensor of ground truth values for single class.
            shape: (x_dim, y_dim, z_dim)
        y_pred (Tensorflow tensor): tensor of predictions for single class.
            shape: (x_dim, y_dim, z_dim)
        axis (tuple): spatial axes to sum over when computing numerator and
            denominator of dice coefficient.
        epsilon (float): small constant added to numerator and denominator to
            avoid divide by 0 errors.

    Returns:
        dice_coefficient (float): computed value of dice coefficient.
    """
    intersection = K.sum(y_true * y_pred, axis=axis)
    dice_numerator = 2 * intersection + epsilon
    dice_denominator = K.sum(y_true, axis=axis) + K.sum(y_pred, axis=axis) + epsilon
    dice_coefficient = dice_numerator / dice_denominator

    return dice_coefficient
### test
# test with a large epsilon in order to catch errors.
# In order to pass the tests, set epsilon = 1
epsilon = 1
sess = K.get_session()
single_class_dice_coefficient_test(single_class_dice_coefficient, epsilon, sess)
Dice Coefficient for Multiple Classes
Now that we have the single class case, we can think about how to approach the multi class context.
- Remember, we want segmentations for each of the 3 classes of abnormality (edema, non-enhancing tumor, enhancing tumor).
- This will give us 3 different dice coefficients (one for each abnormality class).
- To combine these, we can just take the average. We can write that the overall dice coefficient is:
$$DC(f, x, y) = \frac{1}{3} \left ( DC_{1}(f, x, y) + DC_{2}(f, x, y) + DC_{3}(f, x, y) \right )$$
- $DC_{1}$, $DC_{2}$ and $DC_{3}$ are the dice coefficients for edema, non-enhancing tumor, and enhancing tumor, respectively.
For any number of classes $C$, the equation becomes: $$DC(f, x, y) = \frac{1}{C} \sum_{c=1}^{C} \left ( DC_{c}(f, x, y) \right )$$
In our case, with three categories, $C = 3$.
Let's implement the mean dice coefficient.
def dice_coefficient(y_true, y_pred, axis=(1, 2, 3),
                     epsilon=0.00001):
    """
    Compute mean dice coefficient over all abnormality classes.

    Args:
        y_true (Tensorflow tensor): tensor of ground truth values for all classes.
            shape: (num_classes, x_dim, y_dim, z_dim)
        y_pred (Tensorflow tensor): tensor of predictions for all classes.
            shape: (num_classes, x_dim, y_dim, z_dim)
        axis (tuple): spatial axes to sum over when computing numerator and
            denominator of dice coefficient.
        epsilon (float): small constant added to numerator and denominator to
            avoid divide by 0 errors.

    Returns:
        dice_coefficient (float): computed value of dice coefficient.
    """
    intersection = K.sum(y_true * y_pred, axis=axis)
    dice_numerator = 2 * intersection + epsilon
    dice_denominator = K.sum(y_true, axis=axis) + K.sum(y_pred, axis=axis) + epsilon
    dice_coefficient = K.mean(dice_numerator / dice_denominator)

    return dice_coefficient
### test
# test with a large epsilon in order to catch errors.
# In order to pass the tests, set epsilon = 1
epsilon = 1
sess = K.get_session()
dice_coefficient_test(dice_coefficient, epsilon, sess)
Soft Dice Loss
While the Dice Coefficient makes intuitive sense, it is not the best for training.
- This is because it takes in discrete values (zeros and ones).
- The model outputs probabilities that each pixel is, say, a tumor or not, and we want to be able to backpropagate through those outputs.
Therefore, we need an analogue of the Dice coefficient that takes real-valued input. This is where the Soft Dice loss comes in. The formula is:
$$\mathcal{L}_{Dice}(p, q) = 1 - \frac{2\times\sum_{i, j} p_{ij}q_{ij} + \epsilon}{\left(\sum_{i, j} p_{ij}^2 \right) + \left(\sum_{i, j} q_{ij}^2 \right) + \epsilon}$$
- $p$ is our predictions
- $q$ is the ground truth
- In practice each $q_{ij}$ will either be 0 or 1.
- $\epsilon$ is a small number that is added to avoid division by zero
The soft Dice loss ranges between
- 0: perfectly matching the ground truth distribution $q$
- 1: complete mismatch with the ground truth.
You can also check that if $p_{ij}$ and $q_{ij}$ are each 0 or 1, then the soft Dice loss is just one minus the dice coefficient.
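We can verify this numerically: for 0/1 values, $p_{ij}^2 = p_{ij}$ and $q_{ij}^2 = q_{ij}$, so the denominators agree. A quick NumPy check with toy masks (illustrative values):

```python
import numpy as np

# Toy binary masks: for 0/1 values, p**2 == p and q**2 == q,
# so the soft dice loss reduces exactly to 1 - DSC
p = np.array([1., 0., 1., 1.])
q = np.array([1., 0., 0., 1.])

eps = 1e-5
dsc = (2 * np.sum(p * q) + eps) / (np.sum(p) + np.sum(q) + eps)
soft_loss = 1 - (2 * np.sum(p * q) + eps) / (np.sum(p**2) + np.sum(q**2) + eps)

print(np.isclose(soft_loss, 1 - dsc))  # True
```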
Let's implement the soft dice loss.
def soft_dice_loss(y_true, y_pred, axis=(1, 2, 3),
                   epsilon=0.00001):
    """
    Compute mean soft dice loss over all abnormality classes.

    Args:
        y_true (Tensorflow tensor): tensor of ground truth values for all classes.
            shape: (num_classes, x_dim, y_dim, z_dim)
        y_pred (Tensorflow tensor): tensor of soft predictions for all classes.
            shape: (num_classes, x_dim, y_dim, z_dim)
        axis (tuple): spatial axes to sum over when computing numerator and
            denominator in the formula for dice loss.
        epsilon (float): small constant added to numerator and denominator to
            avoid divide by 0 errors.

    Returns:
        dice_loss (float): computed value of dice loss.
    """
    intersection = K.sum(y_true * y_pred, axis=axis)
    dice_numerator = 2. * intersection + epsilon
    dice_denominator = K.sum(K.square(y_true), axis=axis) + K.sum(K.square(y_pred), axis=axis) + epsilon
    dice_loss = 1 - K.mean(dice_numerator / dice_denominator)

    return dice_loss
### test
# test with a large epsilon in order to catch errors.
# In order to pass the tests, set epsilon = 1
epsilon = 1
sess = K.get_session()
soft_dice_loss_test(soft_dice_loss, epsilon, sess)
model = util.unet_model_3d(loss_function=soft_dice_loss, metrics=[dice_coefficient])
base_dir = HOME_DIR + "processed/"
with open(base_dir + "config.json") as json_file:
    config = json.load(json_file)
# Get generators for training and validation sets
train_generator = util.VolumeDataGenerator(config["train"], base_dir + "train/", batch_size=3, dim=(160, 160, 16), verbose=0)
valid_generator = util.VolumeDataGenerator(config["valid"], base_dir + "valid/", batch_size=3, dim=(160, 160, 16), verbose=0)
model.load_weights(HOME_DIR + "model_pretrained.hdf5")
Patch-level Predictions
When applying the model, we'll want to look at segmentations for individual scans (entire scans, not just the sub-volumes).
- This will be a bit complicated because of our sub-volume approach.
- First let's keep things simple and extract model predictions for sub-volumes.
- We can use the sub-volume which we extracted at the beginning
util.visualize_patch(X_norm[0, :, :, :], y[2])
X_norm_with_batch_dimension = np.expand_dims(X_norm, axis=0)
patch_pred = model.predict(X_norm_with_batch_dimension)
Convert prediction from probability into a category
Currently, each element of patch_pred is a number between 0.0 and 1.0.
- Each number is the model's confidence that a voxel is part of a given class.
- We will convert these to discrete 0 and 1 integers by using a threshold.
- We'll use a threshold of 0.5.
- In real applications, we would tune this threshold to achieve the required level of sensitivity or specificity.
# set threshold.
threshold = 0.5
# use threshold to get hard predictions
patch_pred[patch_pred > threshold] = 1.0
patch_pred[patch_pred <= threshold] = 0.0
Now let's visualize the original patch and ground truth alongside our thresholded predictions.
print("Patch and ground truth")
util.visualize_patch(X_norm[0, :, :, :], y[2])
plt.show()
print("Patch and prediction")
util.visualize_patch(X_norm[0, :, :, :], patch_pred[0, 2, :, :, :])
plt.show()
Sensitivity and Specificity
The model is covering some of the relevant areas, but it's definitely not perfect.
- To quantify its performance, we can use per-voxel sensitivity and specificity.
Recall that in terms of the true positives, true negatives, false positives, and false negatives,
$$\text{sensitivity} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$$
$$\text{specificity} = \frac{\text{true negatives}}{\text{true negatives} + \text{false positives}}$$
Let's write a function to compute the sensitivity and specificity per output class.
def compute_class_sens_spec(pred, label, class_num):
    """
    Compute sensitivity and specificity for a particular example
    for a given class.

    Args:
        pred (np.array): binary array of predictions, shape is
            (num classes, height, width, depth).
        label (np.array): binary array of labels, shape is
            (num classes, height, width, depth).
        class_num (int): number between 0 - (num_classes - 1) which says
            which prediction class to compute statistics for.

    Returns:
        sensitivity (float): for a given class_num.
        specificity (float): for a given class_num.
    """
    # extract sub-array for the specified class
    class_pred = pred[class_num]
    class_label = label[class_num]

    # true positives
    tp = np.sum(np.logical_and(class_pred == 1, class_label == 1))
    # true negatives
    tn = np.sum(np.logical_and(class_pred == 0, class_label == 0))
    # false positives
    fp = np.sum(np.logical_and(class_pred == 1, class_label == 0))
    # false negatives
    fn = np.sum(np.logical_and(class_pred == 0, class_label == 1))

    # compute sensitivity and specificity
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    return sensitivity, specificity
### test
compute_class_sens_spec_test(compute_class_sens_spec)
sensitivity, specificity = compute_class_sens_spec(patch_pred[0], y, 2)
print(f"Sensitivity: {sensitivity:.4f}")
print(f"Specificity: {specificity:.4f}")
We can also display the sensitivity and specificity for each class.
def get_sens_spec_df(pred, label):
    patch_metrics = pd.DataFrame(
        columns=['Edema',
                 'Non-Enhancing Tumor',
                 'Enhancing Tumor'],
        index=['Sensitivity',
               'Specificity'])

    for i, class_name in enumerate(patch_metrics.columns):
        sens, spec = compute_class_sens_spec(pred, label, i)
        patch_metrics.loc['Sensitivity', class_name] = round(sens, 4)
        patch_metrics.loc['Specificity', class_name] = round(spec, 4)

    return patch_metrics
df = get_sens_spec_df(patch_pred[0], y)
print(df)
Running on Entire Scans
As of now, our model just runs on patches, but what we really want to see is our model's result on a whole MRI scan.
- To do this, we first generate patches covering the scan.
- Then we run the model on each patch.
- Finally, we combine the results to get a fully labeled MR image.
The output of our model will be a 4D array with 3 probability values for each voxel in our data.
- We then can use a threshold (which you can find by a calibration process) to decide whether or not to report a label for each voxel.
We have a function that stitches the patches together: predict_and_viz(image, label, model, threshold)
- Inputs: an image, a label, and a model.
- Outputs: the model prediction over the whole image, and a visual of the ground truth and prediction.
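The util.predict_and_viz helper handles this stitching internally. As a rough sketch of the idea (using a hypothetical model_predict callable, non-overlapping patch corners, and no blending at patch boundaries, all simplifications relative to the real utility), whole-scan inference might look like:

```python
import numpy as np

def predict_whole_scan(image, model_predict,
                       patch=(160, 160, 16), n_classes=3):
    """Sketch: slide patches over the volume and write each patch's
    class probabilities into a full-size output array. Later patches
    simply overwrite earlier ones where they overlap."""
    x_dim, y_dim, z_dim, _ = image.shape
    out = np.zeros((n_classes, x_dim, y_dim, z_dim))

    def starts(dim, p):
        # patch start coordinates; clamp the final start so the
        # last patch still fits inside the volume
        s = list(range(0, dim - p + 1, p))
        if s[-1] != dim - p:
            s.append(dim - p)
        return s

    for sx in starts(x_dim, patch[0]):
        for sy in starts(y_dim, patch[1]):
            for sz in starts(z_dim, patch[2]):
                sub = image[sx:sx + patch[0],
                            sy:sy + patch[1],
                            sz:sz + patch[2], :]
                # to (1, num_channels, x, y, z) as the model expects
                sub = np.moveaxis(sub, -1, 0)[np.newaxis]
                pred = model_predict(sub)[0]  # (n_classes, x, y, z)
                out[:, sx:sx + patch[0],
                       sy:sy + patch[1],
                       sz:sz + patch[2]] = pred
    return out
```

With a (240, 240, 155, 4) scan this produces a (3, 240, 240, 155) array of per-class probabilities, which can then be thresholded as before.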
image, label = load_case(DATA_DIR + "imagesTr/BRATS_003.nii.gz", DATA_DIR + "labelsTr/BRATS_003.nii.gz")
pred = util.predict_and_viz(image, label, model, .5, loc=(130, 130, 77))
Check how well the predictions do
We can see some of the discrepancies between the model and the ground truth visually.
- We can also use the functions we wrote previously to compute sensitivity and specificity for each class over the whole scan.
- First we need to format the label and prediction to match the shapes our functions expect.
whole_scan_label = keras.utils.to_categorical(label, num_classes = 4)
whole_scan_pred = pred
# move axis to match the shape expected by our functions
whole_scan_label = np.moveaxis(whole_scan_label, 3, 0)[1:4]
whole_scan_pred = np.moveaxis(whole_scan_pred, 3, 0)[1:4]
Now we can compute sensitivity and specificity for each class just like before.
whole_scan_df = get_sens_spec_df(whole_scan_pred, whole_scan_label)
print(whole_scan_df)
Conclusion
In this project we built a deep learning model to automatically segment tumor regions in the brain using MRI (Magnetic Resonance Imaging) scans.
We looked at:
- What is in an MR image
- Standard data preparation techniques for MRI datasets
- Metrics and loss functions for segmentation
- Visualizing and evaluating segmentation models